Xin Jin

EarlyTom: Early Token Compression Completes Fast Video Understanding

May 28, 2026
Via arXiv

BigMac: Breaking the Pareto Frontier of Compute and Memory in Multimodal LLM Training

May 25, 2026
Via arXiv

RankE: End-to-End Post-Training for Discrete Text-to-Image Generation with Decoder Co-Evolution

May 20, 2026
Via arXiv

GTA: Advancing Image-to-3D World Generation via Geometry Then Appearance Video Diffusion

May 13, 2026
Via arXiv

D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models

May 06, 2026
Via arXiv

LoViF 2026 The First Challenge on Holistic Quality Assessment for 4D World Model (PhyScore)

May 06, 2026
Via arXiv

Multi-Agent Systems: From Classical Paradigms to Large Foundation Model-Enabled Futures

Apr 20, 2026
Via arXiv

OmniFood8K: Single-Image Nutrition Estimation via Hierarchical Frequency-Aligned Fusion

Apr 14, 2026
Via arXiv

NTIRE 2026 The Second Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results

Apr 12, 2026
Via arXiv

VGA-Bench: A Unified Benchmark and Multi-Model Framework for Video Aesthetics and Generation Quality Evaluation

Apr 11, 2026
Via arXiv